In [37]:
pricing = get_pricing(['AAPL', 'MSFT'], 
                      start_date='2014-01-01', 
                      end_date='2014-01-07', 
                      frequency='minute',
                      fields='price')

pricing is a DataFrame with the same structure as the return value of history on Quantopian.
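get_pricing is a Quantopian research API; for readers outside that environment, a frame with the same layout can be sketched in plain pandas. The column labels here stand in for the Security objects, and the values are copied from the first rows of the output below:

```python
import pandas as pd

# Minute-frequency, UTC-localized index covering the first five bars
# of the 2014-01-02 session (14:31 UTC onward).
index = pd.date_range('2014-01-02 14:31', periods=5, freq='min', tz='UTC')

# One column per security, one row per minute bar -- the same layout
# that get_pricing returns.
pricing = pd.DataFrame({'AAPL': [79.446, 79.424, 79.490, 79.502, 79.252],
                        'MSFT': [37.340, 37.375, 37.260, 37.260, 37.280]},
                       index=index)
```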


In [38]:
pricing.head(10)


Out[38]:
Security(24 [AAPL]) Security(5061 [MSFT])
2014-01-02 14:31:00+00:00 79.446 37.340
2014-01-02 14:32:00+00:00 79.424 37.375
2014-01-02 14:33:00+00:00 79.490 37.260
2014-01-02 14:34:00+00:00 79.502 37.260
2014-01-02 14:35:00+00:00 79.252 37.280
2014-01-02 14:36:00+00:00 79.184 37.283
2014-01-02 14:37:00+00:00 79.260 37.270
2014-01-02 14:38:00+00:00 79.300 37.300
2014-01-02 14:39:00+00:00 79.259 37.300
2014-01-02 14:40:00+00:00 79.222 37.280

Pandas' built-in groupby and apply operations are extremely powerful. For more information on these features, see http://pandas.pydata.org/pandas-docs/stable/groupby.html.
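As a minimal illustration of the split-apply-combine pattern those docs describe (toy data, not the pricing frame):

```python
import pandas as pd

df = pd.DataFrame({'sector': ['tech', 'tech', 'energy', 'energy'],
                   'price': [79.4, 37.3, 52.1, 48.9]})

# Split rows into groups by sector, apply a reduction to each group,
# and combine the results into a Series indexed by group key.
mean_by_sector = df.groupby('sector')['price'].mean()
# energy: (52.1 + 48.9) / 2 = 50.5; tech: (79.4 + 37.3) / 2 = 58.35
```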


In [39]:
from pandas.tseries.tools import normalize_date

def my_grouper(ts):
    "Function to apply to the index of the DataFrame to break it into groups."
    # Returns midnight of the supplied date.
    return normalize_date(ts)


def first_thirty_minutes(frame):
    "Function to apply to the resulting groups."
    return frame.iloc[:30]

The result of this groupby computation is a hierarchically indexed DataFrame, where the outermost level of the index is the groupby key and the remaining levels hold the values from the frame's original index.


In [40]:
data = pricing.groupby(my_grouper).apply(first_thirty_minutes)
data.head(40)


Out[40]:
Security(24 [AAPL]) Security(5061 [MSFT])
2014-01-02 00:00:00+00:00 2014-01-02 14:31:00+00:00 79.446 37.3400
2014-01-02 14:32:00+00:00 79.424 37.3750
2014-01-02 14:33:00+00:00 79.490 37.2600
2014-01-02 14:34:00+00:00 79.502 37.2600
2014-01-02 14:35:00+00:00 79.252 37.2800
2014-01-02 14:36:00+00:00 79.184 37.2830
2014-01-02 14:37:00+00:00 79.260 37.2700
2014-01-02 14:38:00+00:00 79.300 37.3000
2014-01-02 14:39:00+00:00 79.259 37.3000
2014-01-02 14:40:00+00:00 79.222 37.2800
2014-01-02 14:41:00+00:00 79.187 37.2500
2014-01-02 14:42:00+00:00 79.069 37.2500
2014-01-02 14:43:00+00:00 79.160 37.2300
2014-01-02 14:44:00+00:00 79.186 37.2325
2014-01-02 14:45:00+00:00 79.213 37.1950
2014-01-02 14:46:00+00:00 79.152 37.1601
2014-01-02 14:47:00+00:00 79.183 37.1800
2014-01-02 14:48:00+00:00 79.202 37.1900
2014-01-02 14:49:00+00:00 79.159 37.1550
2014-01-02 14:50:00+00:00 79.190 37.1700
2014-01-02 14:51:00+00:00 79.150 37.1600
2014-01-02 14:52:00+00:00 79.064 37.2150
2014-01-02 14:53:00+00:00 79.134 37.2500
2014-01-02 14:54:00+00:00 79.074 37.2400
2014-01-02 14:55:00+00:00 79.139 37.2350
2014-01-02 14:56:00+00:00 79.164 37.1800
2014-01-02 14:57:00+00:00 79.163 37.1600
2014-01-02 14:58:00+00:00 79.217 37.1800
2014-01-02 14:59:00+00:00 79.165 37.2001
2014-01-02 15:00:00+00:00 79.144 37.2501
2014-01-03 00:00:00+00:00 2014-01-03 14:31:00+00:00 78.937 37.1800
2014-01-03 14:32:00+00:00 78.702 37.1700
2014-01-03 14:33:00+00:00 78.756 37.1301
2014-01-03 14:34:00+00:00 78.552 37.1450
2014-01-03 14:35:00+00:00 78.573 37.1500
2014-01-03 14:36:00+00:00 78.616 37.1400
2014-01-03 14:37:00+00:00 78.693 37.1100
2014-01-03 14:38:00+00:00 78.630 37.1250
2014-01-03 14:39:00+00:00 78.589 37.0900
2014-01-03 14:40:00+00:00 78.543 37.1040
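A note on portability: pandas.tseries.tools.normalize_date has been removed in modern pandas releases. Under recent pandas, the same daily grouping can be sketched with DatetimeIndex.normalize (a toy two-session frame stands in for pricing here):

```python
import pandas as pd

# Toy minute bars spanning two sessions; stands in for `pricing`.
idx = pd.to_datetime(['2014-01-02 14:31', '2014-01-02 14:32',
                      '2014-01-03 14:31', '2014-01-03 14:32']).tz_localize('UTC')
pricing = pd.DataFrame({'AAPL': [79.446, 79.424, 78.937, 78.702]}, index=idx)

# index.normalize() maps each timestamp to midnight of its day, so
# grouping by it matches the normalize_date-based grouper above.
# group_keys=True keeps the day as the outer index level.
data = pricing.groupby(pricing.index.normalize(),
                       group_keys=True).apply(lambda frame: frame.iloc[:30])
```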

Because our DataFrame is hierarchically indexed, we can query it by our groupby keys.


In [41]:
from pandas import Timestamp
# This gives us the first thirty minutes of January 3rd.
data.loc[Timestamp('2014-01-03', tz='UTC')]


Out[41]:
Security(24 [AAPL]) Security(5061 [MSFT])
2014-01-03 14:31:00+00:00 78.937 37.1800
2014-01-03 14:32:00+00:00 78.702 37.1700
2014-01-03 14:33:00+00:00 78.756 37.1301
2014-01-03 14:34:00+00:00 78.552 37.1450
2014-01-03 14:35:00+00:00 78.573 37.1500
2014-01-03 14:36:00+00:00 78.616 37.1400
2014-01-03 14:37:00+00:00 78.693 37.1100
2014-01-03 14:38:00+00:00 78.630 37.1250
2014-01-03 14:39:00+00:00 78.589 37.0900
2014-01-03 14:40:00+00:00 78.543 37.1040
2014-01-03 14:41:00+00:00 78.424 37.1000
2014-01-03 14:42:00+00:00 78.462 37.1400
2014-01-03 14:43:00+00:00 78.539 37.1500
2014-01-03 14:44:00+00:00 78.573 37.1600
2014-01-03 14:45:00+00:00 78.509 37.1120
2014-01-03 14:46:00+00:00 78.510 37.0600
2014-01-03 14:47:00+00:00 78.477 37.0600
2014-01-03 14:48:00+00:00 78.519 37.0600
2014-01-03 14:49:00+00:00 78.501 37.0200
2014-01-03 14:50:00+00:00 78.419 37.0100
2014-01-03 14:51:00+00:00 78.392 37.0400
2014-01-03 14:52:00+00:00 78.317 37.0450
2014-01-03 14:53:00+00:00 78.287 37.0200
2014-01-03 14:54:00+00:00 78.216 37.0300
2014-01-03 14:55:00+00:00 78.230 37.0200
2014-01-03 14:56:00+00:00 78.244 37.0150
2014-01-03 14:57:00+00:00 78.299 37.0200
2014-01-03 14:58:00+00:00 78.252 37.0301
2014-01-03 14:59:00+00:00 78.344 37.0500
2014-01-03 15:00:00+00:00 78.332 37.0372

If we want to query on the second level of the index, we have to use .xs with a level argument instead of .loc.

Note that level=1 means the second level of the index, because the levels start at index 0.
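.xs behaves the same way on any MultiIndex; a self-contained toy example (not the pricing data, but the same (day, minute) shape):

```python
import pandas as pd

# Two-level index: (day, minute), mirroring the grouped frame above.
index = pd.MultiIndex.from_tuples(
    [('2014-01-02', '14:31'), ('2014-01-02', '14:32'),
     ('2014-01-03', '14:31'), ('2014-01-03', '14:32')],
    names=['day', 'minute'])
frame = pd.DataFrame({'price': [79.446, 79.424, 78.937, 78.702]}, index=index)

# .loc selects by the outer level; .xs with level=1 cuts across the
# inner level, returning one row per outer key.
by_day = frame.loc['2014-01-02']           # both minutes of Jan 2
first_minute = frame.xs('14:31', level=1)  # one row per day
```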


In [42]:
data.xs(Timestamp('2014-01-03 14:58:00', tz='UTC'), level=1)


Out[42]:
Security(24 [AAPL]) Security(5061 [MSFT])
2014-01-03 00:00:00+00:00 78.252 37.0301

If we just want to work with the original index values, we can drop the extra level from our index.
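Reassigning Index.droplevel, as in the next cell, does the trick; in newer pandas the same result can also be had in one step with DataFrame.droplevel, sketched here on a toy MultiIndex frame:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
frame = pd.DataFrame({'x': [10, 20, 30]}, index=index)

# Two equivalent ways to discard the outer index level:
via_index = frame.copy()
via_index.index = via_index.index.droplevel(0)  # reassign the index
via_frame = frame.droplevel(0)                  # DataFrame.droplevel, one step
```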


In [43]:
data_copy = data.copy()
data_copy.index = data_copy.index.droplevel(0)
data_copy.head()


Out[43]:
Security(24 [AAPL]) Security(5061 [MSFT])
2014-01-02 14:31:00+00:00 79.446 37.340
2014-01-02 14:32:00+00:00 79.424 37.375
2014-01-02 14:33:00+00:00 79.490 37.260
2014-01-02 14:34:00+00:00 79.502 37.260
2014-01-02 14:35:00+00:00 79.252 37.280